Learning Objectives: In this vignette, you will learn how to implement a kinship model that combines both two-sex and time-varying approaches with multiple stages. You will understand how to incorporate sex-specific demographic rates that change over time, analyze the effects of demographic transitions on kinship structures by sex and stage, and explore applications with different stage variables such as parity and education.
In this final vignette, we integrate all elements from previous tutorials into the most comprehensive kinship model available: a two-sex time-varying multi-state model. This framework simultaneously accounts for:
Building on Caswell and colleagues’ theoretical developments (2019, 2020, 2022; 2021), this approach provides
unprecedented analytical power for understanding complex family
structures. We’ll implement this comprehensive model using the
kin_multi_stage_time_variant_2sex function, exploring two
examples: one using parity and another using educational attainment to
demonstrate how these models illuminate contemporary family dynamics in
ways simpler models cannot.
If you haven’t already installed the required packages from the previous vignettes, here’s what you’ll need:
# Install basic data analysis packages
install.packages("dplyr") # Data manipulation
install.packages("tidyr") # Data tidying
install.packages("ggplot2") # Data visualization
install.packages("readr") # Data import
install.packages("knitr") # Document generation
install.packages("data.table")# Efficient data handling
install.packages("Matrix") # Matrix operations
# Install DemoKin
# DemoKin is available on CRAN (https://cran.r-project.org/web/packages/DemoKin/index.html),
# but we'll use the development version on GitHub (https://github.com/IvanWilli/DemoKin):
install.packages("remotes")
remotes::install_github("IvanWilli/DemoKin")Let’s load the necessary packages for our analysis:
rm(list = ls())
library(dplyr) # For data manipulation
library(tidyr) # For restructuring data
library(ggplot2) # For visualization
library(readr) # For reading data
library(knitr) # For document generation
library(DemoKin) # For kinship analysis
library(Matrix) # For matrix operations
library(tictoc) # For timing operations
options(dplyr.summarise.inform = FALSE) # hide summarise outputThe kin_multi_stage_time_variant_2sex function in the
DemoKin package allows us to implement a comprehensive
kinship model that accounts for sex, time, and stage simultaneously.
This function computes stage-specific kinship networks across both sexes
for an average member of a population (focal) under time-varying
demographic rates.
The model estimates:
The two-sex time-varying multi-state model requires several inputs:
In the following sections, we’ll explore two examples of this model using different stage variables: parity and education.
In our first example, we’ll use parity (number of children already born) as our stage variable. We’ll use data from the United Kingdom ranging from 1965 to 2022, sourced from the Human Mortality Database and the Office for National Statistics.
Due to data limitations, we make some simplifying assumptions:
Let’s load the pre-processed UK data:
# Load pre-processed data for UK
F_mat_fem <- Female_parity_fert_list_UK # Female fertility by parity
F_mat_male <- Female_parity_fert_list_UK # Male fertility (same as female due to androgynous approximation)
T_mat_fem <- Parity_transfers_by_age_list_UK # Female parity transitions
T_mat_male <- Parity_transfers_by_age_list_UK # Male parity transitions (same as female)
U_mat_fem <- Female_parity_mortality_list_UK # Female survival
U_mat_male <- Male_parity_mortality_list_UK # Male survival
H_mat <- Redistribution_by_parity_list_UK # Birth redistribution matricesThese lists contain period-specific demographic rates:
U_mat_fem/U_mat_male: Lists of matrices
containing survival probabilities by age (rows) and parity (columns)
from 1965-2022F_mat_fem/F_mat_male: Lists of matrices
containing fertility rates by age and parityT_mat_fem/T_mat_male: Lists of transition
matrices showing probabilities of moving between parity statesH_mat: List of matrices that redistribute newborns to
age-class 1 and parity 0Now let’s implement the two-sex time-varying multi-state model with parity as the stage variable:
# Define time period and parameters
no_years <- 40 # Run the model for 40 years (1965-2005)
# Run the model
kin_out_1965_2005 <-
kin_multi_stage_time_variant_2sex(
U_list_females = U_mat_fem[1:(1+no_years)], # Female survival matrices
U_list_males = U_mat_male[1:(1+no_years)], # Male survival matrices
F_list_females = F_mat_fem[1:(1+no_years)], # Female fertility matrices
F_list_males = F_mat_male[1:(1+no_years)], # Male fertility matrices
T_list_females = T_mat_fem[1:(1+no_years)], # Female transition matrices
T_list_males = T_mat_fem[1:(1+no_years)], # Male transition matrices
H_list = H_mat[1:(1+no_years)], # Birth redistribution matrices
birth_female = 1 - 0.51, # UK sex ratio (49% female)
parity = TRUE, # Stages represent parity
output_kin = c("d", "oa", "ys", "os"), # Selected kin types
summary_kin = TRUE, # Produce summary statistics
sex_Focal = "Female", # Focal is female
initial_stage_Focal = 1, # Focal starts at parity 0
output_years = c(1965, 1975, 1985, 1995, 2005) # Selected output years
)## 2043.35 sec elapsed
Note: This model run takes approximately 10 minutes to complete. For the purpose of this vignette, we’ll assume it has already been run.
Let’s examine the structure of the output:
The output includes:
age_focal: Age of the focal individualkin_stage: Stage (parity) of the relativessex_kin: Sex of the relativesyear: Calendar year of observationgroup: Type of relative (d = children, oa = older
aunts/uncles, etc.)count: Expected number of living relativescohort: Birth cohort of the focal individualLet’s visualize the distribution of older aunts and uncles by parity for different ages of Focal across different calendar years:
kin_out_1965_2005$kin_summary %>%
filter(group == "oa") %>%
ggplot(aes(x = age_focal, y = count, color = stage_kin, fill = stage_kin)) +
geom_bar(position = "stack", stat = "identity") +
facet_grid(sex_kin ~ year) +
scale_x_continuous(breaks = c(0,10,20,30,40,50,60,70,80,90,100)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
labs(
title = "Parity distribution of older aunts and uncles by Focal's age",
subtitle = "United Kingdom, 1965-2005",
x = "Age of focal individual",
y = "Number of older aunts and uncles",
fill = "Parity",
color = "Parity"
)Now let’s examine the distribution of children by parity:
kin_out_1965_2005$kin_summary %>%
filter(group == "d") %>%
ggplot(aes(x = age_focal, y = count, color = stage_kin, fill = stage_kin)) +
geom_bar(position = "stack", stat = "identity") +
facet_grid(sex_kin ~ year) +
scale_x_continuous(breaks = c(0,10,20,30,40,50,60,70,80,90,100)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
labs(
title = "Parity distribution of children by Focal's age",
subtitle = "United Kingdom, 1965-2005",
x = "Age of focal individual",
y = "Number of children",
fill = "Parity",
color = "Parity"
)We can also examine the kinship networks of specific birth cohorts. Let’s compare the offspring parity distributions for three different cohorts (1910, 1925, and 1965):
kin_out_1965_2005$kin_summary %>%
filter(group == "d", cohort %in% c(1910, 1925, 1965)) %>%
ggplot(aes(x = age_focal, y = count, color = stage_kin, fill = stage_kin)) +
geom_bar(position = "stack", stat = "identity") +
facet_grid(sex_kin ~ cohort) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
labs(
title = "Parity distribution of offspring by birth cohort",
subtitle = "United Kingdom, ages observed between 1965-2005",
x = "Age of focal individual",
y = "Number of offspring",
fill = "Parity",
color = "Parity"
)Interpretation:
For more detailed analysis, we can examine the age and parity distribution of specific relatives. Let’s look at the younger siblings of a 50-year-old Focal across different years:
kin_out_1965_2005$kin_full %>%
filter(group == "ys",
age_focal == 50) %>%
ggplot(aes(x = age_kin, y = count, color = stage_kin, fill = stage_kin)) +
geom_bar(position = "stack", stat = "identity") +
facet_grid(sex_kin ~ year) +
scale_x_continuous(breaks = c(0,10,20,30,40,50,60,70,80,90,100)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
labs(
title = "Age and parity distribution of younger siblings",
subtitle = "For a 50-year-old focal individual, United Kingdom, 1965-2005",
x = "Age of sibling",
y = "Number of younger siblings",
fill = "Parity",
color = "Parity"
)Notice the discontinuity along the x-axis at age 50. This reflects the fact that younger siblings cannot be older than Focal (by definition). Similarly, when we examine older siblings, we’ll see they cannot be younger than Focal.
With a simple manipulation of the output data frame, we can also plot the age and parity distribution of all siblings combined:
kin_out_1965_2005$kin_full %>%
filter((group == "ys" | group == "os"),
age_focal == 50) %>%
pivot_wider(names_from = group, values_from = count) %>%
mutate(count = `ys` + `os`) %>%
ggplot(aes(x = age_kin, y = count, color = stage_kin, fill = stage_kin)) +
geom_bar(position = "stack", stat = "identity") +
facet_grid(sex_kin ~ year) +
scale_x_continuous(breaks = c(0,10,20,30,40,50,60,70,80,90,100)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
labs(
title = "Age and parity distribution of all siblings",
subtitle = "For a 50-year-old focal individual, United Kingdom, 1965-2005",
x = "Age of sibling",
y = "Number of siblings",
fill = "Parity",
color = "Parity"
)In our second example, we’ll use educational attainment as our stage variable. We’ll use data from Singapore ranging from 2020 to 2090, sourced from the Wittgenstein Center. The data is aggregated into 5-year age groups and 5-year time intervals.
Some simplifying assumptions we make due to data availability:
Let’s load the pre-processed Singapore data:
This data includes:
U_mat_fem_edu/U_mat_male_edu: Lists of
matrices containing survival probabilities by age (rows) and education
(columns) from 2020-2090F_mat_fem_edu/F_mat_male_edu: Lists of
matrices containing fertility rates by age and educationT_mat_fem_edu/T_mat_male_edu: Lists of
transition matrices showing probabilities of moving between education
statesH_mat_edu: List of matrices that redistribute newborns
to age-class 1 and “no education” categoryBefore running the model, let’s examine some trends in the data:
# Calculate and plot Total Fertility Rate by education level over time
tfr_data <- lapply(seq_along(F_mat_fem_edu), function(i) {
col_sums <- colSums(F_mat_fem_edu[[i]])
data.frame(
year = 2020 + (i - 1) * 5,
education = factor(colnames(F_mat_fem_edu[[i]]),
levels = c("e1", "e2", "e3", "e4", "e5", "e6"),
labels = c("no education", "incomplete primary",
"primary", "lower secondary",
"upper secondary", "post-secondary")),
tfr = col_sums
)
})
tfr_df <- do.call(rbind, tfr_data)
# Plot TFR trends
ggplot(tfr_df, aes(x = year, y = tfr, color = education, group = education)) +
geom_line(size = 1) +
geom_point() +
theme_minimal() +
labs(
title = "Total Fertility Rate by Education Over Time",
x = "Year",
y = "TFR",
color = "Education"
)Now let’s implement the two-sex time-varying multi-state model with education as the stage variable:
# Define time period and parameters
time_range <- seq(2020, 2090, 5)
no_years <- length(time_range) - 1
output_year_select <- seq(1, no_years + 1, 1)
# Run the model
kin_out_2020_2090 <-
kin_multi_stage_time_variant_2sex(
U_list_females = U_mat_fem_edu[1:(1+no_years)], # Female survival matrices
U_list_males = U_mat_male_edu[1:(1+no_years)], # Male survival matrices
F_list_females = F_mat_fem_edu[1:(1+no_years)], # Female fertility matrices
F_list_males = F_mat_male_edu[1:(1+no_years)], # Male fertility matrices
T_list_females = T_mat_fem_edu[1:(1+no_years)], # Female transition matrices
T_list_males = T_mat_fem_edu[1:(1+no_years)], # Male transition matrices
H_list = H_mat_edu[1:(1+no_years)], # Birth redistribution matrices
birth_female = 1/(1.06+1), # Singapore sex ratio (48.5% female)
parity = FALSE, # Stages represent education
summary_kin = TRUE, # Produce summary statistics
sex_Focal = "Female", # Focal is female
initial_stage_Focal = 1, # Focal starts with no education
output_years = output_year_select # All years
)## 148.29 sec elapsed
Note: This model run takes approximately 5 minutes to complete. For the purpose of this vignette, we’ll assume it has already been run.
Now we need to recode the stage variables to show meaningful educational labels and convert years/ages to real values (since we used 5-year intervals):
# Recode year and age variables to show correct values
kin_out_2020_2090$kin_summary <-
kin_out_2020_2090$kin_summary %>%
mutate(year = (year-1)*5+min(time_range),
age_focal = age_focal*5,
cohort = year - age_focal,
stage_kin = factor(stage_kin, levels = c(1, 2, 3, 4, 5, 6),
labels = c(
"no education",
"incomplete primary",
"primary",
"lower secondary",
"upper secondary",
"post-secondary"
)))
kin_out_2020_2090$kin_full <-
kin_out_2020_2090$kin_full %>%
mutate(year = (year-1)*5+min(time_range),
age_focal = age_focal*5,
age_kin = age_kin*5,
cohort = year - age_focal,
stage_kin = factor(stage_kin, levels = c(1, 2, 3, 4, 5, 6),
labels = c(
"no education",
"incomplete primary",
"primary",
"lower secondary",
"upper secondary",
"post-secondary"
)))First, let’s visualize the total number of living kin by educational attainment for a woman born in 2020 in Singapore:
kin_out_2020_2090$kin_summary %>%
# Exclude Focal from the analysis
filter(group != "Focal") %>%
filter(cohort == 2020) %>%
rename(kin = group) %>%
rename_kin(sex = "2sex") %>%
summarise(count = sum(count), .by = c(stage_kin, age_focal)) %>%
ggplot(aes(x = age_focal, y = count, fill = stage_kin)) +
geom_area(colour = "black") +
labs(
title = "Total living kin by educational attainment over the life course",
subtitle = "For a woman born in 2020, Singapore",
y = "Expected number of living kin",
x = "Age of focal individual",
fill = "Educational attainment"
) +
theme_bw() +
theme(legend.position = "bottom")Now let’s look at specific types of relatives:
kin_out_2020_2090$kin_summary %>%
filter(group %in% c("d", "gd", "m", "gm")) %>%
filter(cohort == 2020) %>%
rename(kin = group) %>%
rename_kin(sex = "2sex") %>%
summarise(count = sum(count), .by = c(kin_label, stage_kin, age_focal)) %>%
ggplot(aes(x = age_focal, y = count, fill = stage_kin)) +
geom_area(colour = "black") +
labs(
title = "Living kin by educational attainment over the life course",
subtitle = "For a woman born in 2020, Singapore",
y = "Expected number of living kin",
x = "Age of focal individual",
fill = "Educational attainment"
) +
facet_wrap(. ~ kin_label) +
theme_bw() +
theme(legend.position = "bottom")Let’s examine the age and educational distribution of key relatives when the focal individual is 60 years old:
kin_out_2020_2090$kin_full %>%
filter(
group %in% c("d", "gd", "m", "gm"),
age_focal == 60,
cohort == 2020
) %>%
rename(kin = group) %>%
rename_kin("2sex") %>%
ggplot(aes(x = age_kin, y = count, color = stage_kin, fill = stage_kin)) +
geom_bar(position = "stack", stat = "identity") +
scale_x_continuous(breaks = c(0,10,20,30,40,50,60,70,80,90,100)) +
labs(
title = "Age and educational distribution of key relatives",
subtitle = "For a 60-year-old woman born in 2020, Singapore",
x = "Age of relative",
y = "Number of relatives",
fill = "Educational attainment",
color = "Educational attainment"
) +
facet_wrap(. ~ kin_label) +
theme_bw() +
theme(
axis.text.x = element_text(angle = 90, vjust = 0.5),
legend.position = "bottom"
)Now let’s examine how the educational composition of specific relatives changes over time. First, let’s look at the educational attainment of descendants (children and grandchildren) of 60-year-old women across different calendar years:
kin_out_2020_2090$kin_summary %>%
filter(
group %in% c("d", "gd"),
age_focal == 60
) %>%
rename(kin = group) %>%
rename_kin(sex = "2sex") %>%
summarise(count = sum(count), .by = c(stage_kin, kin_label, year)) %>%
ggplot(aes(x = year, y = count, fill = stage_kin)) +
geom_area(colour = "black") +
labs(
title = "Educational attainment of descendants over time",
subtitle = "For women aged 60, Singapore, 2020-2090",
y = "Expected number of descendants",
x = "Year",
fill = "Educational attainment"
) +
facet_grid(. ~ kin_label) +
theme_bw() +
theme(legend.position = "bottom")This visualization shows how educational expansion transforms the composition of family networks over time, with increasing proportions of higher-educated relatives among descendants.
When implementing these models, we’ve made several simplifying assumptions due to data limitations:
For the UK parity model:
For the Singapore education model:
These assumptions should be kept in mind when interpreting the results.
In this vignette, we’ve implemented two-sex time-varying multi-state kinship models—the most comprehensive demographic kinship framework currently available. By integrating age, sex, time, and stage dimensions, these models provide unprecedented insights into family structures and their evolution.
Our two examples demonstrated the analytical power of this integrated approach:
These comprehensive models have numerous applications across disciplines:
While implementing these models requires substantial data and computational resources, approximation methods can be useful when data is limited. Despite their complexity, these integrated models provide essential tools for understanding contemporary family dynamics in all their multidimensional complexity.
As societies continue to transform through demographic, educational, and social changes, these sophisticated modeling approaches help us anticipate emerging kinship patterns and develop responsive policies for societies in transition. By capturing the full demographic complexity of family formation, they represent the frontier of formal kinship modeling and offer valuable insights for both research and policy.